Extending logic to add_csv_partitions and leveraging catalog_table_input #674

jaidisido · 2021-05-03T20:56:14Z

Issue #, if available:
#672

Description of changes:
Added some enhancements to the previous PR #673:

Leverage the catalog_table_input details, when available, to extract the SerDe info from existing catalog tables.
Extend the same SerDe handling to the add_csv_partitions method.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jaidisido · 2021-05-03T21:10:13Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: 09ec000
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

jaidisido · 2021-05-03T20:57:30Z

awswrangler/catalog/_definitions.py

+    serde_info = {
+        "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
+        if serde_library is None
+        else serde_library,
+        "Parameters": {"field.delim": sep, "escape.delim": "\\"} if serde_parameters is None else serde_parameters,
+    }


I believe we missed this method in the previous PR
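The defaulting rule in this hunk can be exercised on its own. Below is a minimal sketch that mirrors it as a standalone function; the name `build_csv_serde_info` is hypothetical and not part of awswrangler's API. It only reproduces the dictionary logic shown in the diff, not the surrounding partition definition.

```python
from typing import Any, Dict, Optional

# Default SerDe used by the hunk above when no library is passed in.
DEFAULT_CSV_SERDE = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"


def build_csv_serde_info(
    sep: str = ",",
    serde_library: Optional[str] = None,
    serde_parameters: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the SerdeInfo defaulting in the diff.

    Each field falls back to its default independently, so a custom
    library can be combined with the default delimiter parameters.
    """
    return {
        "SerializationLibrary": DEFAULT_CSV_SERDE if serde_library is None else serde_library,
        # Note: when serde_parameters is given, `sep` is not used at all.
        "Parameters": {"field.delim": sep, "escape.delim": "\\"}
        if serde_parameters is None
        else serde_parameters,
    }


# Usage: defaults only, then a custom SerDe carried over from an existing table.
print(build_csv_serde_info(sep="|"))
print(
    build_csv_serde_info(
        serde_library="org.apache.hadoop.hive.serde2.OpenCSVSerde",
        serde_parameters={"separatorChar": ",", "quoteChar": '"'},
    )
)
```

One consequence of this shape worth keeping in mind: explicit `serde_parameters` replace the default parameters wholesale, so the `sep` argument no longer affects the result.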

@maxispeicher, apologies, I am not sure whether I already tagged you for review on this PR.

maxispeicher · 2021-05-05T18:47:12Z

I'm not sure either. If so, I completely missed it 🙈
I also thought that I had included it, but it seems I forgot about it in the end 😓.

jaidisido · 2021-05-03T20:59:19Z

awswrangler/s3/_write_text.py

+                serde_info: Dict[str, Any] = {}
+                if catalog_table_input:
+                    serde_info = catalog_table_input["StorageDescriptor"]["SerdeInfo"]
+                serde_library: Optional[str] = serde_info.get("SerializationLibrary", None)
+                serde_parameters: Optional[Dict[str, str]] = serde_info.get("Parameters", None)


In the to_csv method, instead of passing None to the create_csv_table and add_csv_partitions methods, we extract the SerDe info from the existing catalog table, if one exists.
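A minimal sketch of that extraction step, written as a standalone function so it runs without AWS. It assumes `catalog_table_input` has the shape of a Glue `TableInput` (a dict with `StorageDescriptor` → `SerdeInfo`); the function name `extract_serde_info` is hypothetical. Unlike the diff, which indexes the keys directly, this sketch uses `.get()` so an input without a `SerdeInfo` yields `None` instead of raising `KeyError`; that is a choice of the sketch, not the PR's behavior.

```python
from typing import Any, Dict, Optional, Tuple


def extract_serde_info(
    catalog_table_input: Optional[Dict[str, Any]],
) -> Tuple[Optional[str], Optional[Dict[str, str]]]:
    """Return (serde_library, serde_parameters) from an existing table definition.

    Both values are None when there is no existing table, so the callee
    (e.g. the CSV table/partition builders) applies its own defaults.
    """
    serde_info: Dict[str, Any] = {}
    if catalog_table_input:
        # Tolerate a missing StorageDescriptor or SerdeInfo (sketch-only choice).
        serde_info = catalog_table_input.get("StorageDescriptor", {}).get("SerdeInfo", {})
    return serde_info.get("SerializationLibrary"), serde_info.get("Parameters")


# Usage: an existing table that uses OpenCSVSerde, then no existing table at all.
existing = {
    "StorageDescriptor": {
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
            "Parameters": {"separatorChar": ","},
        }
    }
}
print(extract_serde_info(existing))
print(extract_serde_info(None))  # (None, None)
```

Forwarding both values to create_csv_table and add_csv_partitions means new partitions inherit the table's SerDe rather than falling back to the LazySimpleSerDe default.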

jaidisido · 2021-05-03T20:59:49Z

tests/test_athena_csv.py

+    wr.catalog.add_csv_partitions(
+        database=glue_database,
+        table=glue_table,
+        partitions_values=response["partitions_values"],


Extended the test to also cover add_csv_partitions.

jaidisido · 2021-05-03T22:11:29Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: 979f728
Result: SUCCEEDED
Build Logs (available for 30 days)


maxispeicher

Looks good to me. Thank you for adding the missing parts 🙂


Extending logic to add_csv_partitions and leveraging catalog_table_input

09ec000

Adapting catalog versioning test

979f728

jaidisido commented May 3, 2021


maxispeicher approved these changes May 5, 2021


jaidisido merged commit dff4aa6 into main May 5, 2021

jaidisido deleted the feat-672-add-serde-info branch May 5, 2021 19:05
